A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval

Authors

  • Sanjay Joshua Swamidass
  • Chloé-Agathe Azencott
  • Kenny Daily
  • Pierre Baldi
Abstract

MOTIVATION: The performance of classifiers is often assessed using receiver operating characteristic (ROC) curves [or accumulation curves (AC), also called enrichment curves] and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of any interest, and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this 'early retrieval' problem.

RESULTS: To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived, and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization, the CROC(exp) (an exponential transform of the ROC curve), as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way of measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix.

AVAILABILITY: Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/

CONTACT: [email protected]
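The exponential transform mentioned in the abstract can be illustrated with a minimal sketch. It assumes the magnification function has the form f(x) = (1 - exp(-αx)) / (1 - exp(-α)), applied to the false positive rate axis of the ROC curve. The value α = 7 and all function names below are illustrative assumptions, not the API of the CROC package linked above.

```python
import math


def exp_magnify(x, alpha=7.0):
    """Map x in [0, 1] onto [0, 1], stretching the region near 0.

    Assumed form: f(x) = (1 - exp(-alpha * x)) / (1 - exp(-alpha)).
    """
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    Assumes at least one positive and one negative label.
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Group tied scores so that each tie produces a single ROC point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_croc(scores, labels, alpha=7.0):
    """Area under the exponentially transformed ROC curve.

    Exact when there are no tied scores (the ROC curve is then a step
    function); with ties, the trapezoid rule is only an approximation.
    """
    croc = [(exp_magnify(x, alpha), y) for x, y in roc_points(scores, labels)]
    return area_under(croc)
```

With α = 7, the first 10% of the false positive rate axis occupies roughly half of the transformed axis, so errors near the top of the ranked list weigh far more heavily on the area than errors further down.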

Similar resources

CROC: A New Evaluation Criterion for Recommender Systems

Evaluation of a recommender system algorithm is a challenging task due to the many possible scenarios in which such systems may be deployed. We have designed a new performance plot called the CROC curve with an associated statistic: the area under the curve. Our CROC curve supplements the widely used ROC curve in recommender system evaluation by discovering performance characteristics that stan...

Bridging the Gap Between Neural Network and Kernel Methods: Applications to Drug Discovery

We develop a hybrid machine learning architecture, the Influence Relevance Voter (IRV), where an initial geometry- or kernel-based step is followed by a feature-based step to derive the final prediction. While other implementations of the general idea are possible, we use a k-Nearest-Neighbor approach to implement the first step, and a Neural Network approach to implement the second step for a cla...

Collapsing ROC approach for risk prediction research on both common and rare variants

Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not y...

Optimizing Area Under the ROC Curve using Ranking SVMs

Area Under the ROC Curve (AUC), often used for comparing classifiers, is a widely accepted performance measure for ranking instances. Many researchers have studied optimization of AUC, usually via optimizing some approximation of a ranking function. Ranking SVMs are among the better performers but their usage in the literature is typically limited to learning a total ranking from partial ranking...

The Drosophila fork head domain protein crocodile is required for the establishment of head structures.

The fork head (fkh) domain defines the DNA-binding region of a family of transcription factors which has been implicated in regulating cell fate decisions across species lines. We have cloned and molecularly characterized the crocodile (croc) gene which encodes a new family member from Drosophila. croc is expressed in the head anlagen of the blastoderm embryo under the control of the anterior, ...

Journal title:
  • Bioinformatics

Volume 26, Issue 10

Pages -

Publication date 2010